feat: sglang guided decoding support#6620
Conversation
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
|
👋 Hi jellysnack! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThis change refactors tokenizer flag handling by replacing Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 391-400: The helper _get_guided_decoding_params currently only
reads guided_decoding["json"] and will ignore requests using
guided_decoding["json_schema"]; update _get_guided_decoding_params to accept
both keys (prefer "json_schema" if present, fall back to "json") and return
{"json_schema": json.dumps(value)} when either is supplied so schema constraints
are preserved; ensure the existing type check on guided_decoding
(isinstance(..., dict)) remains and return {} when neither key exists.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/src/dynamo/sglang/args.pycomponents/src/dynamo/sglang/register.pycomponents/src/dynamo/sglang/request_handlers/handler_base.pycomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.pycomponents/src/dynamo/sglang/request_handlers/llm/diffusion_handler.pycomponents/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
|
Hey @jellysnack - thank you for the PR. Sorry have been overall very busy. This PR is on my TODO list to review this week |
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
|
just a gentle ping on this PR |
|
@ishandhanani Could you take a look when you get a chance? Thanks! |
|
/ok to test a18b513 |
|
@jellysnack Thanks for contributing. Gentle reminder to add a reproducer and output before and after fix. |
|
sorry for jumping in. Need to refresh this PR to pull a fix for Allure reporting. |
|
/ok to test ac19772 |
Done |
Overview:
Add guided decoding support for the SGLang engine and replace
skip_tokenizer_initwithuse_sglang_tokenizerfor tokenizer selection.Details:
SGLang's
GrammarManagerdisablesgrammar_backendwhenskip_tokenizer_init=True(ref), which makes guided decoding impossible. To fix this:skip_tokenizer_init=Trueoverride. SGLang still supports both token-based and text-based inputs without it.skip_tokenizer_initreferences withuse_sglang_tokenizerthroughout handlers and registration logic._get_guided_decoding_params()helper inBaseWorkerHandlerto extractjson_schemafromguided_decodingrequest params and forward them to SGLang's sampling params.DecodeWorkerHandlerandPrefillWorkerHandler.Where should the reviewer start?
components/src/dynamo/sglang/request_handlers/handler_base.py– new_get_guided_decoding_params()methodcomponents/src/dynamo/sglang/args.py– removal ofskip_tokenizer_initcomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.py– guided decoding integration andskip_tokenizer_inittouse_sglang_tokenizerswitchReproducer
Start the backend:
Send a request with JSON schema:
Before fix:
After fix:
{ "response": "Hello! How can I assist you today?" }Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit